File Uploads + media_server app #33

ormsbee · 2023-02-16T21:41:38Z

Allow Content data to be stored as FileFields backed by django-storages, and serve those assets via a simple internal app.

I'm afraid I muddled this PR a bit. I needed to fix the dependencies, and I got that muddled in with my actual code changes (though they're in separate commits). I also ended up running black on the Python files I touched, meaning there are some unrelated formatting issues higher up in the files that got caught. I can break this up if it's too much to sort through.

From the commit message:

This alters the data model in the components app in a number of ways in
order to support static assets, along with some refactoring:

* Content has been renamed to RawContent to make it clearer when we
  are talking about "content" as a general concept versus the actual
  data model that holds the raw bytes.
* RawContent now uses FileField instead of BinaryField, giving us more
  cheaply scalable storage in exchange for higher latency. This is
  offset by the new TextContent model, which will be used to store the
  text versions of RawContent that need low-latency access (like XBlock
  OLX).
* A primitive media_server app now exists to view static assets during
  development. It is NOT safe to use on a running site yet.

openedx_learning/core/components/models.py

ormsbee · 2023-03-17T19:05:31Z

The part I'm struggling with the most here is the asset download permissions. I had that encoded into the RawContent at one point, and then backed it out to the ContentVersionRawContent through model. But thinking on it more, it feels like it should live outside the versioning workflow altogether, since if you accidentally left something public, you wouldn't want to leave old versions of it public when you lock the current version down.

bradenmacdonald · 2023-03-21T18:17:39Z

To what extent is learning core responsible for permissions?
Are we going with this: is_public content is CDN-cacheable and public, and everything else requires a signed URL, which can be retrieved by (e.g.) the LMS and given to authorized users, but Learning Core doesn't know anything beyond that?

Edit: lol never mind, I see you answered that above.
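
For readers following the thread, the split described in the question above could look roughly like the sketch below. It is an illustration only, not code from this PR: the function names, the secret, the base URLs, and the HMAC-SHA256 signing scheme are all assumptions, and a real deployment would more likely rely on the storage backend's own signed URLs.

```python
import hashlib
import hmac
import time
from typing import Optional

# Hypothetical values for illustration; nothing here comes from the PR.
SECRET_KEY = b"not-a-real-secret"
CDN_BASE = "https://cdn.example.com"
MEDIA_BASE = "https://media.example.com"


def _signature(path: str, expires: int) -> str:
    """HMAC-SHA256 over the asset path and its expiry timestamp."""
    message = f"{path}:{expires}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def asset_url(path: str, is_public: bool, now: Optional[int] = None, ttl: int = 300) -> str:
    """Return a plain CDN URL for public assets, else a short-lived signed URL."""
    if is_public:
        return f"{CDN_BASE}/{path}"
    issued = int(time.time()) if now is None else now
    expires = issued + ttl
    return f"{MEDIA_BASE}/{path}?expires={expires}&sig={_signature(path, expires)}"


def verify(path: str, expires: int, sig: str, now: Optional[int] = None) -> bool:
    """Accept the URL only if it has not expired and the signature matches."""
    current = int(time.time()) if now is None else now
    if current > expires:
        return False
    return hmac.compare_digest(sig, _signature(path, expires))
```

In this picture, a caller such as the LMS would decide who is authorized and then hand out the result of asset_url(); the signing layer itself knows nothing about users or enrollment.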


ormsbee · 2023-04-14T14:52:36Z

@bradenmacdonald, @kdmccormick, @feanil: Ready for review.

ormsbee · 2023-04-19T15:28:18Z

Rebased to resolve some dependency conflicts.

feanil

One question about Django; otherwise, I don't think I have any more questions/feedback at the moment.

feanil · 2023-04-21T14:41:09Z

requirements/base.in

@@ -1,15 +1,6 @@
 # Core requirements for using this application
 -c constraints.txt

-Django                # Web application framework
+Django<4.0                # Web application framework


What's the reason for not using a newer Django here? Because we'll be running it as an edx-platform library eventually?

Yeah. I haven't set up tox or anything yet to test across multiple versions, so I just wanted to make sure when I'm doing the dev locally that I'm not using any features that don't exist in 3.2.

bradenmacdonald

Just some very minor questions/comments. Looks great!

bradenmacdonald · 2023-04-21T20:41:53Z

olx_importer/management/commands/load_components.py

+                    raw_content=raw_content,
+                    text=data_str,
+                    length=len(data_str),
+                )


Should we check that len(data_bytes) < MAX_TEXT_LENGTH before trying to save the copy of the OLX into TextContent (i.e. store larger OLX on S3 only, but still allow it)? Or do we want to always throw an error for such large OLX?

At this point I want it to throw an error. Anything that big is probably going to cause issues down the road, and I'm honestly curious where we actually have that in the wild. I suspect in most cases, it's because something unexpected and weird is happening, like copy-pasting from a Word doc and bringing the images over as base64-encoded HTML.
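
The behavior described in the reply (reject oversized OLX rather than quietly skipping the TextContent copy) might be sketched as follows. MAX_TEXT_LENGTH is named in the thread, but its value here, the helper name, and the exception type are assumptions made for illustration.

```python
# Assumed limit for illustration; the real MAX_TEXT_LENGTH value is not shown in this thread.
MAX_TEXT_LENGTH = 100_000


class TextTooLongError(ValueError):
    """Hypothetical error raised when OLX is too large to store as text."""


def checked_text(data_bytes: bytes, max_length: int = MAX_TEXT_LENGTH) -> str:
    """Decode OLX bytes, failing loudly instead of falling back to file-only storage."""
    data_str = data_bytes.decode("utf-8")
    # The limit is applied to the decoded text, matching length=len(data_str) in the diff.
    if len(data_str) > max_length:
        raise TextTooLongError(
            f"OLX is {len(data_str)} characters; limit is {max_length}"
        )
    return data_str
```

Raising here means unusually large OLX surfaces during import, where it can be investigated, instead of being discovered later.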

bradenmacdonald · 2023-04-21T20:48:04Z

openedx_learning/contrib/readme.rst

+Guidelines
+----------
+
+Nothing from ``lib`` or ``core`` should *ever* import from ``contrib``.


The isolated apps linter could enforce this for you if you wanted down the road :)

I do have some primitive import linting set up here, though I need to actually stop being lazy and hook up the CI to run it properly.

Oh awesome :)
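
As a rough idea of what such an import check involves, here is a minimal sketch using only the standard library's ast module. It is not the linting setup mentioned above, which this thread does not show; the function name is hypothetical, and only the openedx_learning.contrib package name is taken from the readme path in the diff.

```python
import ast
from typing import List

FORBIDDEN_PREFIX = "openedx_learning.contrib"


def contrib_imports(source: str) -> List[str]:
    """Return the dotted names under contrib that the given source imports."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # "from a.b import c" is treated as a reference to "a.b.c".
            names = [f"{node.module}.{alias.name}" for alias in node.names]
        else:
            # Relative imports are ignored in this sketch.
            continue
        for name in names:
            if name == FORBIDDEN_PREFIX or name.startswith(FORBIDDEN_PREFIX + "."):
                found.append(name)
    return found
```

A CI job could run a function like this over every file under lib and core and fail if any list comes back non-empty.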

bradenmacdonald · 2023-04-21T20:51:10Z

olx_importer/management/commands/load_components.py

+                raw_content.file.save(
+                    f"{raw_content.learning_package.uuid}/{hash_digest}",
+                    ContentFile(data_bytes),
+                )


Actually, at least for smaller OLX content, why do we need to save the file to S3 if we have it in-DB via TextContent? Especially if it's learner_downloadable=False, is there any use case for reading from S3? From the docstring on the models it sounds like this is just for overall consistency, but I'm wondering if there's a use case?

It's mostly consistency, and I thought it might make export simpler. I also thought that there might be some edge cases where we want to later promote RawContent to TextContent after the fact, and have them exist in both places, like video transcripts. But mostly I just like the idea that everything is minimally represented as an opaque blob, and there can be multiple levels of progressive enhancement of that data over time, e.g. XBlockContent, Video, etc.
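
The layering described in this reply (every piece of content exists as an opaque blob, with optional richer representations on top) could be pictured with the plain-Python sketch below. It deliberately avoids Django: the dataclasses, the in-memory dict standing in for file storage, and the choice of SHA-256 for hash_digest are assumptions, and only the uuid/hash_digest path layout comes from the diff above.

```python
import hashlib
import uuid
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Stand-in for a django-storages backend: maps storage path -> bytes.
file_storage: Dict[str, bytes] = {}


@dataclass
class RawContent:
    learning_package_uuid: uuid.UUID
    hash_digest: str
    path: str


@dataclass
class TextContent:
    raw_content: RawContent
    text: str
    length: int


def store(
    package_uuid: uuid.UUID, data_bytes: bytes, as_text: bool = False
) -> Tuple[RawContent, Optional[TextContent]]:
    """Always write the blob; optionally add a low-latency text copy on top."""
    # The hash algorithm is an assumption; the PR thread does not name one.
    hash_digest = hashlib.sha256(data_bytes).hexdigest()
    path = f"{package_uuid}/{hash_digest}"
    file_storage[path] = data_bytes
    raw = RawContent(package_uuid, hash_digest, path)
    text = None
    if as_text:
        data_str = data_bytes.decode("utf-8")
        text = TextContent(raw, data_str, len(data_str))
    return raw, text
```

Because the blob is always written, a later step could add a text copy for content that was first stored as a file only, which is the "promote after the fact" case mentioned for video transcripts.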


bradenmacdonald · 2023-04-21T23:11:33Z

openedx_learning/core/components/models.py

+    # is a part of hasn't started yet. That's a matter of LMS permissions and
+    # policy that is not intrinsic to the content itself, and exists at a layer
+    # above this.
+    learner_downloadable = models.BooleanField(default=False)


^ Thank you for this nice, clear, detailed docstring and field name 💯



The codecov dependency had to be replaced because it got pulled from PyPI. The DRF dependency somehow never made it in there before.

This was referenced Feb 16, 2023

File uploads + Experimental Media Server #31

Closed

Content data model should use File Storage. #29

Closed

ormsbee changed the title ~~File Uploads with media_server app~~ File Uploads + media_server app Feb 16, 2023

ormsbee self-assigned this Mar 3, 2023

kdmccormick reviewed Mar 9, 2023

openedx_learning/core/components/models.py (outdated)

ormsbee mentioned this pull request Apr 1, 2023

Refactoring Proposal: Rename Content model to RawContent #32

Closed

ormsbee commented Apr 1, 2023

openedx_learning/core/components/models.py (outdated)

ormsbee force-pushed the file_upload branch 3 times, most recently from d18dc2d to 3e50b18 Compare April 14, 2023 02:49

ormsbee marked this pull request as ready for review April 14, 2023 02:53

ormsbee force-pushed the file_upload branch from 3e50b18 to 8c8343b Compare April 19, 2023 15:27

feanil approved these changes Apr 21, 2023


bradenmacdonald approved these changes Apr 21, 2023


David Ormsbee added 2 commits April 21, 2023 22:39

chore: add DRF, remove codecov, rebuild requirements

6dc99b3


ormsbee force-pushed the file_upload branch from 8707e98 to 6dc99b3 Compare April 22, 2023 02:40

ormsbee merged commit 5390dd4 into openedx:main Apr 22, 2023
